You are viewing the RapidMiner Studio documentation for version 10.1.

Documents to Data (Text Processing)

Synopsis

Generates a data set from documents.

Description

This operator generates a data set from a collection of documents. For each document in the collection, one example is added to the data set. The text contained in the document is stored in a nominal attribute. If the documents have a label or meta data associated with them, a label attribute or attributes for the meta data are created, respectively.

Input

  • documents (Collection)

    The documents port.

Output

  • example set (Data Table)

    The example set port.

Parameters

  • text_attribute

    The name of the text attribute.

  • label_attribute

    The name of the label attribute.

  • add_meta_information

    If checked, available meta information about the text, such as filename and date, is added as attributes.

  • datamanagement

    Determines how the data is represented internally.
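
The behavior described above can be sketched in plain Python. This is only an illustration of the document-to-example mapping, not RapidMiner code: the `Document` class, the `documents_to_data` function, and the default attribute names `"text"` and `"label"` are assumptions made for this example. The keyword arguments mirror the text_attribute, label_attribute, and add_meta_information parameters; datamanagement is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a text document; not a RapidMiner class."""
    text: str
    label: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def documents_to_data(
    documents: List[Document],
    text_attribute: str = "text",        # assumed default name
    label_attribute: str = "label",      # assumed default name
    add_meta_information: bool = True,
) -> List[Dict[str, Any]]:
    """Build one row (example) per document."""
    rows = []
    for doc in documents:
        # The document text always goes into the text attribute.
        row = {text_attribute: doc.text}
        # A label attribute is only filled when the document has a label.
        if doc.label is not None:
            row[label_attribute] = doc.label
        # Meta information (e.g. filename, date) becomes extra attributes.
        if add_meta_information:
            for key, value in doc.meta.items():
                row.setdefault(key, value)  # never overwrite text or label
        rows.append(row)
    return rows


docs = [
    Document("Spam offer inside", label="spam", meta={"filename": "a.txt"}),
    Document("Meeting at noon", meta={"filename": "b.txt"}),
]
rows = documents_to_data(docs, text_attribute="content")
for row in rows:
    print(row)
```

In this sketch a document without a label simply has no label key in its row; in a real data table the corresponding cell would more likely be a missing value.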